Summary


This report identifies the available computer tests that are designed to assess the performance
of  multiple  persons  engaged  in  shared  information  processing  tasks.  The  purpose  of  this  report  is 
to  examine  the  literature  concerning  group  cognitive  performance  measures  and  consider  their 
potential  for  military  research  relevant  to  a  small  team,  such  as  an  infantry  squad  or  the  aircrew 
of  an  aircraft.  Tests  were  favored  which  were  quantitative  and  automated.  Additionally,  this 
review  focused  on  tests  which  assessed  real-time  information  processing  (rather  than  physical 
performance  or  managerial  skill),  and  which  could  be  generalized  across  more  than  one  mission, 
platform, or experiment. Five hundred seventy-one potentially relevant abstracts were reviewed,
of which 73 full-text articles were deemed relevant and 54 candidate measures of team
performance  were  identified.  After  further  review,  seven  tests  were  selected  as  the  most 
appropriate  to  future  military  research  within  the  stated  scope  of  this  report  (as  described  in  the 
Methods  section).  The  seven  tests  of  greatest  interest  were  (listed  in  no  order  of  preference): 
Tactical Naval Decision-Making System (TANDEM), Team Performance Assessment
Technology (TPAT), Team Interactive Decision Exercise for Teams Incorporating Distributed
Expertise (TIDE2), C3 (Command, Control, & Communications) Interactive Task for Identifying
Emerging  Situations  (NeoCITIES),  Distributed  Dynamic  Decision  Making  (DDD),  Agent 
Enabled  Decision  Group  Environment  (AEDGE),  and  Duo  Wondrous  Original  Method  Basic 
Awareness/Airmanship  Test  (DuoWOMBAT).  The  characteristics,  strengths,  and  potential 
limitations  of  each  test  are  discussed  briefly  in  this  report  and  references  are  provided  for  further 
information.  Each  of  these  seven  tests  is  designed  to  be  relevant  to  the  performance  of  military  or 
paramilitary  crews  or  teams.  Only  one  test  (NeoCITIES)  is  not  designed  specifically  for  military 
applications,  but  it  was  included  because  it  has  many  desirable  features  and  it  is  suitable  for 
paramilitary  (e.g.,  police)  situations  and  for  scenarios  relevant  to  national  defense,  such  as 
simulating  a  coordinated  emergency  response  to  terrorist  attacks  on  civilian  centers.  The  test 
most  similar  to  the  rudimentary  aspects  of  flight  control  tasks  engaged  in  by  military  aviation 
crewmembers is the DuoWOMBAT. The other six tests (TANDEM, TPAT, TIDE2, NeoCITIES,
DDD, and AEDGE) focused on various aspects of team performance most relevant to
command/control  situations,  such  as  handling  threats  and  allocating  resources.  The  tests  which 
were  judged  as  most  likely  to  be  relevant,  readily  available,  widely/recently  used,  and  relatively 
mature  in  terms  of  validation  included  NeoCITIES,  DDD,  and  DuoWOMBAT.  Of  these, 
AEDGE,  DDD,  and  DuoWOMBAT  are  clearly  available  for  immediate  purchase.  The 
Warfighter  Health  Division  of  the  U.S.  Army  Aeromedical  Research  Laboratory  (USAARL) 
currently  owns  a  copy  of  DDD  and  DuoWOMBAT,  which  have  been  chosen  to  fill  past  or 
current  research  needs.  A  new  test  in  development  (C3Conflict)  was  identified  after  the 
completion  of  literature  gathering  for  this  review;  it  appears  to  have  many  desirable  features  and 
should  be  considered  further  as  more  validation  work  is  done.  Future  research  on  military  and 
paramilitary  team  performance  should  consider  the  information  in  this  report  when  seeking  to 
identify  the  tests  most  appropriate  to  the  specific  needs  of  the  scientific  effort  being  planned. 
Further  use,  refinement,  validation,  and  comparison  of  the  existing  automated  group  performance 
measures  are  encouraged. 
Introduction


Coordinated  team  performance  is  important  to  the  success  and  safety  of  military  personnel  in 
nearly  all  of  the  missions  they  are  asked  to  perform.  Efficient  shared  processing  of  information  is 
a  critical  feature  of  operations  performed  by  units  of  infantry,  armor,  artillery,  aviation,  special 
forces,  logistics,  medical  services,  military  intelligence,  and  communications  personnel.  The 
harmful  effect  of  inaccurate  shared  cognition  is  well-documented  as  a  contributing  factor  to 
aircraft mishaps (Salas et al., 2001; Katz et al., 2006) and command/control-related accidents
(Armstrong,  1994). 

Research  on  human  performance  traditionally  focused  on  the  individual,  but  has  gradually 
expanded  from  a  consideration  of  individual  information  processing  to  recognition  of  the  role  of 
shared  cognition  (Ntuen,  2006).  As  a  result,  computerized  measures  of  team  performance  are 
being  developed  to  quantify  shared  information  processing.  Good  measures  of  team  performance 
would benefit the military by allowing it to determine the characteristics of good teamwork and
evaluate  the  effectiveness  of  team  training  methods  (Baker  and  Salas,  1997).  Additionally,  good 
measures  of  team  performance  should  aid  the  development  of  cost-effective  training  simulations 
(Simpson  and  Oser,  2003). 

Although  several  computerized  tests  have  been  developed  to  measure  military-relevant  team 
performance,  no  test  obviously  dominates  the  field  of  inquiry  in  the  way  that  measurement  of 
individual performance has tended to be dominated by such tests as those found within the
Unified Tri-Service Cognitive Performance Assessment Battery (UTC-PAB) (Reeves and
Thorne, 1986; Englund et al., 1985). There is limited information concerning which team
performance  tests  are  optimal  for  military  small  group  research,  and  few  reports  have  been 
published  describing  and  comparing  the  available  tests.  The  most  recent  comparative  review  was 
written  five  years  ago  by  Go,  Bos,  and  Lamoureux  (2006),  who  thoroughly  reviewed  44  potential 
test platforms. Prior to that, Banner (2004) reviewed seven team performance tests, while
Bowers and Jentsch (2001) reviewed the suitability of 36 commercial computer games for use in
team performance research.

The scarcity of recent comparative information on computerized team
performance tests contributes to the present lack of uniformity in the measurement of military
team performance and limits comparisons across studies. For these reasons, a literature review
was performed to assess the current state of team performance measures and identify those most
suitable for military research. This review focused on low-to-medium fidelity systems because
their relative affordability, portability, configurability, and ease of adoption tend to make them
more commonly used and widely disseminated than high-end, custom-built, “one-of-a-kind”
simulation facilities (Banner, 2004; Dahlstrom et al., 2009; Jentsch and Bowers, 1998; Bowers
and Jentsch, 2001).

A  number  of  approaches,  including  surveys,  behavioral  checklists,  and  computer  tests  or 
simulations,  have  been  used  to  quantify  team  performance  (Brannick,  Salas,  and  Prince,  1997). 
This  report  focused  solely  on  quantitative  computerized  tests  of  team  performance  on  shared 
information-processing  tasks.  The  purpose  of  this  report  was  to  identify  a  subset  of  the  most 
suitable  automated  measures  of  team  performance  for  use  in  military  research.  This  report  was 
not intended as a comprehensive review of the field of team research or the various issues
surrounding measurement of team performance, but rather as a practical guide to the available
computerized team performance metrics. The literature abounds with reviews of the history and
state of team performance research, discussions of team coordination and team training
approaches, discussions of problems of definition in team performance research, theoretical
discussions of the essential nature and aspects of team performance, working models of team
performance, discussions of the importance of measuring team performance, descriptions of the
problems and issues to be considered when attempting to measure team performance, and
recommendations  concerning  how  to  develop  measures  of  team  performance  (e.g.,  what  a  good 
measure  should  be  able  to  do).  This  report  attempts  to  address  a  less-frequently  discussed 
question,  which  can  be  phrased  as  follows:  “What  automated  small  team  performance  measures 
exist  which  I  should  consider  applying  to  my  military  research  studies?”  This  report  seeks  to 
answer  that  question  by  identifying  the  most  appropriate  tests  for  this  purpose.  It  was  not  the 
purpose  of  this  review  to  identify  the  “single  best”  test.  Rather,  the  purpose  of  this  review  was  to 
produce  a  short  list  of  the  most  suitable  tests  for  research  on  military  team  performance,  in  hopes 
that  future  research  activity  would  become  more  focused  and  the  list  narrowed  down  further  by 
the  subsequent  efforts  of  the  research  community.  Greater  standardization  of  team  performance 
testing  protocols  would  be  of  benefit  to  the  military  research  community. 


Methods 


A literature search was conducted for articles published in 1990 or later. The searched
databases included Defense Technical Information Center, PsycINFO, PubMed, and Army
Research Laboratory (ARL) library online. The primary search terms were “team” and
“performance.” Additional search terms were discussed with professional librarians, who
assisted  with  the  literature  search.  The  librarians  based  their  search  on  an  idealized  set  of  criteria 
exemplified  by  the  following  compound  statement  devised  by  the  first  author: 

[Objective / Performance-Based / Quantitative / Automated / Computer / Computerized ...

(NEAR)

... Measure / Metric / Battery / Test / Assessment / Task / Technology]

(AND)

[(Team / Group / Shared / Squad / Crew / Cockpit) Performance ...

(NEAR)

... Effectiveness / Cognition / Cognitive / Decision / Attention / Coordination / Resource Management]
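
The compound statement above can be read as two proximity clauses joined by AND. A minimal Python sketch of that reading follows. The 5-word NEAR window, the function names, and the exact-word matching (no stemming, so "tests" does not match "test") are illustrative assumptions and do not reproduce the syntax of any of the databases that were searched.

```python
import re

# Term lists copied from the compound search statement.
QUALIFIERS = ["objective", "performance-based", "quantitative",
              "automated", "computer", "computerized"]
INSTRUMENTS = ["measure", "metric", "battery", "test",
               "assessment", "task", "technology"]
GROUPS = ["team", "group", "shared", "squad", "crew", "cockpit"]
OUTCOMES = ["effectiveness", "cognition", "cognitive", "decision",
            "attention", "coordination", "resource management"]

NEAR_WINDOW = 5  # assumed proximity window in words; not given in the report


def tokenize(text):
    """Lowercase the text and split it into words (hyphenated words are kept whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def positions(tokens, phrase):
    """Return the start indices where a one- or multi-word phrase occurs."""
    words = phrase.split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(tokens, left_terms, right_terms, window=NEAR_WINDOW):
    """True if any left term starts within `window` words of any right term."""
    left = [p for t in left_terms for p in positions(tokens, t)]
    right = [p for t in right_terms for p in positions(tokens, t)]
    return any(abs(a - b) <= window for a in left for b in right)


def matches_query(text):
    """Apply both NEAR clauses and join them with AND."""
    tokens = tokenize(text)
    group_performance = [f"{g} performance" for g in GROUPS]
    clause_1 = near(tokens, QUALIFIERS, INSTRUMENTS)
    clause_2 = near(tokens, group_performance, OUTCOMES)
    return clause_1 and clause_2
```

Under these assumptions, a title such as "An automated test of team performance and coordination in aircrews" satisfies both clauses, whereas a title about leadership styles in business satisfies neither.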

A  team  of  three  research  psychologists  (the  report  authors)  reviewed  the  resulting  literature 
obtained  by  the  search.  The  authors  excluded  those  items  that  were  not  clearly  related  to  the 
focus  of  this  review.  The  qualitative  exclusion  criteria  included  the  following: 

a.  Items  were  excluded  which  did  not  describe  the  measurement  of  team  cognition  or 
performance. 
b. Items were excluded which dealt solely with business/management issues, such as team
building,  personnel  selection,  or  leadership  (i.e.,  we  excluded  reports  where  such  variables  were 
not  part  of  a  group  of  other  relevant  measures  being  assessed  directly  by  the  group’s  keyable 
responses  during  a  computerized  performance  test). 

c.  Items  were  excluded  concerning  physical  performance  studies  (e.g.,  strength,  endurance)  if 
they  did  not  also  include  a  cognitive  or  psychomotor  aspect. 

d.  This  review  sought  to  identify  computerized  performance  tests  which  could  be  adapted  to 
current  research  efforts,  so  literature  prior  to  1990  was  excluded,  since  articles  before  that  time 
would  have  preceded  the  mainstream  use  of  personal  computers,  graphical  user  interfaces  for 
Microsoft® Windows, hypertext markup language, etc. However, in a few cases, some
important post-1990 articles were obtained on team performance tests which referred back to
initial  development  efforts  of  the  same  test  shortly  before  1990,  in  which  case  we  provide  those 
earlier  references  in  this  report. 

A  collection  of  abstracts  or  articles  was  identified  for  further  evaluation.  After  evaluation  of 
these  items,  full  text  articles  were  obtained  of  the  most  relevant  items  selected  for  a  second  round 
of  more  detailed  evaluation.  The  first  and  second  round  of  evaluations  gave  special  preference  to 
articles  which  came  closest  to  meeting  the  following  inclusion  criteria: 

a.  Reports  were  preferred  which  described  systematic,  quantitative,  computerized 
performance  tasks  (as  distinguished  from  surveys,  observer-based  methods,  task  analyses, 
theoretical  papers,  computer  games,  or  training  improvements  or  guidelines). 

b.  Reports  were  preferred  which  described  military-relevant  tasks  and  measures  (vice 
measures  only  applicable  to  non-military  situations  or  general  measures  of  cognitive  state). 
Greatest consideration was given to tasks/measures which appeared to be part-task or
medium-fidelity simulations similar to many tasks which must be performed by small military groups
(e.g.,  squads,  aircrew). 

c.  Reports  were  preferred  which  were  relevant  to  small  groups  (vice  entire  agencies  or 
companies);  with  a  small  group  being  defined  as  10  people  or  fewer  (4  to  10  people  are 
considered  a  “squad”  by  the  military). 

d. Reports were preferred which were relevant to real-time shared information processing (as
distinguished  from  long-range  or  strategic  planning).  For  example,  the  first  author  would  include 
any  report  concerning  the  speed  and  accuracy  of  a  small  group’s  ability  to  detect  target  events 
while  communicating  and  coordinating  an  appropriate  response. 

e.  Reports  were  preferred  which  described  generalizable  tasks  or  tests  not  limited  to  one 
mission,  platform,  experiment,  project,  facility,  or  course  of  training/simulation.  For  example,  we 
excluded  unique  simulations  of  high  complexity  and  cost  (such  as  the  150-person  Virtual 
Warfare  Center)  which  are  likely  to  exist  in  only  one  or  two  places.  Similarly,  training-only 
applications  or  games  with  limited  scoring  outputs  that  are  not  intended  primarily  for  scientific 
research  were  excluded  (e.g.,  the  Army’s  laptop-based  OneSAF  ground  warfare  training  game). 
f. We did not seek to identify every generation of a given test throughout its history of
development.  Rather,  we  included  only  the  more  recent  versions.  For  example,  NeoCITIES 
derived  from  CITIES,  but  we  only  list  NeoCITIES  in  this  review.  Similarly,  TANDEM  is 
derived  from  the  earlier  Team  Performance  Assessment  Battery  (TPAB),  and  both  the  TPAB  and 
the  Team  Resource  Management  Test  (REMAN)  are  derived  from  still  earlier  tests  such  as  the 
Multiple  Task  Performance  Battery  (MTPB)  and  Distributed  Resource  Allocation  and 
Management  (DREAM)  task  (Bowers,  Urban,  and  Morgan,  1992).  For  the  purposes  of  this 
review,  we  have  listed  TANDEM,  TPAB,  and  REMAN  below,  but  not  the  earliest  tests  such  as 
MTPB  or  DREAM.  It  should  be  noted  that  this  type  of  exclusion  was  rare,  since  most  tests  have 
kept  their  original  name  as  they  were  modified  or  have  only  changed  names  once. 


Results  and  Discussion 

Round  1  findings:  Identification  of  an  initial  list  of  potential  tests 

During  the  initial  literature  search,  571  abstracts  or  articles  were  identified  for  preliminary 
review,  from  which  73  full  text  articles  were  deemed  to  merit  a  second  round  of  more  detailed 
review.  Among  these  73  reports,  54  potential  team  performance  tests  were  identified.  This  full 
list  of  potential  measures  is  shown  below.  The  reader  should  note  that  many  of  the  items  below 
did  not  turn  out  to  be  computerized  tests  of  team  performance  once  the  exclusion  and  inclusion 
criteria  were  more  carefully  assessed  during  further  review  (see  Round  2  findings  below).  Below 
are  the  names  of  each  item  and  one  or  more  resource  citations. 

a.  Team  Performance  Assessment  Battery  (TPAB) 

(1) Bowers, Urban, and Morgan (1992)

(2) Urban et al. (1995)

(3)  Schraagen  and  Rasker  (2003) 

b.  Team  Performance  Assessment  Simulation  (TPAS) 

(1)  Swezey  et  al.  (1998) 

c. Team Performance Assessment Technology (TPAT)

(1)  Swezey,  Hutcheson,  and  Swezey  (2000) 

(2) Lamoureux et al. (2006) (see appendix p. A-13)

d.  Tactically  Relevant  Assessment  of  Combat  Teams  (TRACTs) 

(1)  Fowlkes  et  al.  (1999) 

e.  Team  Resource  Management  Test  (REMAN) 

(1)  Woldring  and  Issac  (1999) 

(2)  Hintz  (2011) 

f. Tactical Naval Decision-Making System (TANDEM)

(1)  Canty  and  Schwab  (2001) 
(2) Dwyer et al. (1992)

(3) Van Berlo (2004)

(4)  Lenox  et  al.  (1999) 
g. Decision-making Evaluation Facility for Tactical Teams (DEFTT)

(1) Johnston, Poirier, and Smith-Jentsch (1998)

h.  Tactical  Decision-Making  Under  Stress  (TADMUS) 

(1)  Cannon-Bowers  and  Salas  (1998) 

(2)  Johnston,  Cannon-Bowers,  and  Salas  (1998) 

i.  Adaptive  Architectures  for  Command  and  Control  (A2C2) 

(1)  Entin  and  Entin  (2000) 

j.  Tactically  Relevant  Assessment  of  Combat  Events  (TRACE) 

(1)  McCluskey  et  al.  (1998) 

k.  Team  Interactive  Decision  Exercise  for  Teams  Incorporating  Distributed  Expertise 
(TIDE2) 

(1) Hollenbeck et al. (1991)

(2) Hollenbeck et al. (1997)

(3) Hollenbeck et al. (1995)

(4) Lamoureux et al. (2006) (see appendix p. A-17)

l.  Targeted  Acceptable  Responses  to  Generated  Events  or  Tasks  (TARGETs) 

(1)  Fowlkes  et  al.  (1992) 

(2)  Fowlkes  et  al.  (1994) 

m.  The  Army  Command  and  Control  Evaluation  System  (ACCES) 

(1)  Hayes,  Layton,  and  Ross  (1993) 

(2)  Halpin  (1996) 

(3)  Essens  et  al.  (2005) 

n.  NeoCITIES  (derived  from  CITIES) 

(1)  Jones  et  al.  (2004) 

(2)  McNeese  et  al.  (2005) 

o.  Hierarchical  Task  Analysis  (Teams)  (HTA[T]) 

(1)  Annett,  Cunningham,  and  Mathias-Jones  (2000) 

p.  Duo  Wondrous  Original  Method  Basic  Awareness/Airmanship  Test  (DuoWOMBAT) 
(1)  Breton,  Tremblay,  and  Banbury  (2007) 

q.  Tactical  Simulation  System  (TACSIM) 

(1)  Tactical  simulation  system  (TACSIM)  (2011) 

(2) Go, Bos, and Lamoureux (2006) (see appendix p. A-10)

r.  Distributed  Dynamic  Decision  Making  (DDD) 

(1)  Galster,  Nelson,  and  Bolia  (2005) 

(2)  Aptima,  Inc.  (2005) 
s. DDD-III Simulator: a) North African “Insertion from the Sea;” b) Joint
Command/Control;  c)  Joint  Task  Force;  d)  Joint  Task  Force  Group 

(1)  Wollenbecker  et  al.  (1999) 

(2)  Hocevar  et  al.  (1999) 

(3)  Levchuk  et  al.  (1999) 

(4)  Lamoureux  et  al.  (2006)  (see  appendix  p.  A-21;  A-30;  A-33;  A-35) 

t.  DDDnet,  Airborne  Warning  and  Control  System  (AWACS)  Weapon  Director  Teams 

(1)  Barnes,  Elliott,  and  Entin  (2001) 

(2) Lamoureux et al. (2006) (see appendix p. A-31)

u.  Agent  Enabled  Decision  Group  Environment  (AEDGE)1 

(1)  Barnes  et  al.  (2004) 

(2)  Barnes  et  al.  (2004) 

(3)  Elliot  et  al.  (2002) 

(4)  Barnes,  Petrov,  and  Elliott  (2002) 

v.  Cognitive  Engineering  Research  on  Team  Tasks  (CERTT) 

(1)  Cooke  and  Shope  (2005) 

(2)  Cooke  (2002) 

(3) Go, Bos, and Lamoureux (2006) (see appendix p. A-15)

w.  TEAMSim 

(1)  DeShon  et  al.  (2004) 

(2)  DeShon,  Brown,  and  Greenis  (1996) 

x.  Air  Combat  Mission  Planning  (ACMP) 

(1) Gaylord and Sowell (1992)

y.  Anti-Air  Teamwork  Observation  Measure  (ATOM) 

(1)  Smith-Jentsch  et  al.  (1998) 

(2)  Shanahan  et  al.  (2007) 

(3)  Entin  and  Entin  (2001) 

z.  Low-Fidelity  Aviation  Research  Methodology  (LFARM) 

(1) Bowers et al. (1992)

aa.  Controller  Teamwork  Evaluation  and  Assessment  Methodology  (CTEAM) 

(1) Bailey et al. (1999)

bb.  Roboflag 

(1)  Funke  and  Galster  (2009) 

(2) Guznov et al. (2011)


1  This  test  is  referred  to  by  various  phrases  and  acronyms,  with  “AEDGE”  and  “GROUP”  being  most  common. 
cc. Space Fortress

(1) Shebilske et al. (1999)

dd.  Advanced  Disaster  Management  Simulator  (ADMS) 

(1) This system is described at: http://trainingfordisastermanagement.com/

(2) Go, Bos, and Lamoureux (2006) (see appendix p. A-12).

ee.  Air  Operations  Centres  (AOC),  AWACS  in  the  Command,  Control,  and  Communications 
Simulation,  Training  and  Research  System  (C3STARS)  Facility 

(1)  This  system  is  described  at: 

a)  ww.hec.afrl.af.mil/Organization/HECP/AOC.asp 

b) www.mesa.afmc.af.mil/html/c3stars.htm

(2)  Go,  Bos,  and  Lamoureux  (2006)  (see  appendix  p.  A-4;  A-5) 

ff.  NASA  Ames  Center  -  Distributed  Facilities 

(1)  Jonas (2008) 

(2)  National  Space  Biomedical  Research  Institute  (2010) 

(3)  Go,  Bos,  and  Lamoureux  (2006)  (see  appendix  p.  A-7) 

gg.  One  Semi-Automated  Forces  (OneSAF) 

(1)  OneSAF  Objective  System  is  described  at:  www.onesaf.org/onesaf.html 

(2)  Go,  Bos,  and  Lamoureux  (2006)  (see  appendix  p.  A-9) 

hh.  Virtual  Warfare  Centre  (VWC) 

(1)  Villaneuva  (2007) 

ii.  Synthetic  Task  Environment  (STE)  in  Cognitive  Engineering  Research  on  Team  Tasks 
(CERTT)  Lab 

(1)  Lamoureux  et  al.  (2006)  (see  appendix  p.  A-4) 

jj.  Team  and  Individual  Tactical  Assessment  Network  (TITAN) 

(1)  Blais,  Thompson,  and  Baranski  (2002) 

(2)  System  described  at:  http://ntt.ca/ 

kk.  Bolo,  Tank  Battle  Game 

(1)  Knight,  Durham,  and  Locke  (2001) 

ll. Dangerous Waters, Naval Combat Experience

(1)  Commercial  site  for  Sonalysts  Combat  Simulations:  www.scs-dangerouswaters.com 

(2)  Go,  Bos,  and  Lamoureux  (2006)  (see  appendix  p.  A-29) 

mm.  Longbow  2,  Helicopter  Flight  Simulator 
(1)  Marks  et  al.  (2002) 

nn. Team Argus, a radar-like classification task

(1)  Miller  (2001) 

(2)  Schoelles  and  Gray  (1998) 
(3) Lamoureux et al. (2006) (see appendix p. A-19)


oo.  Neverwinter  Nights:  a)  Recovering  weapons  from  hidden  caches;  b)  Capture  the  Flag 

(1)  The  commercial  website  is:  http://www.bioware.com/games/legacy 

(2)  Weil  et  al.  (2005) 

pp.  ATC  team  training  device 

(1)  Bailey  and  Thompson  (2000) 

qq.  Multi-agent  Operation  Range  Simulation  Environment  (MORSE) 

(1)  Rectenwald  et  al.  (2003) 

(2)  Sycara  et  al.  (2005) 

rr.  SCUDHunt 

(1)  Perla  et  al.  (2000) 

(2)  Holzworth  (2002) 

ss. Wright State Aegis Simulation Platform (WASP)/Team Aegis Simulation Platform
(TASP  -  an  extension  of  WASP) 

(1)  These  systems  are  described  at:  http://www2.ie.psu.edu/Rothrock/Research/HPAM/ 

(2)  Lamoureux  et  al.  (2006)  (see  appendix  p.  A-26) 

tt.  Janus  Wargaming  Simulation 
(1)  Chapman  et  al.  (2002). 

uu.  Networked  Fire  Chief  (NFC) 

(1)  Chapman  et  al.  (2002) 

(2)  Chapman  (2000) 

vv.  C3Fire 

(1) Dube et al. (2010)

(2)  Granlund  (2003) 

(3)  Persson  and  Worm  (2002) 

ww.  Microsoft  Flight  Simulator,  Pilots  (ATC) 

(1)  Brannick  et  al.  (1995) 

(2)  Lamoureux  et  al.  (2006)  (see  appendix  p.  A-34) 

xx.  Ruthless.com  (Red  Storm  Entertainment) 

(1)  Bowers  and  Jentsch  (2001) 

yy. Fleet Command (Jane’s Combat Simulations). Note: Of four games listed by Bowers and
Jentsch (2001), this one simulates a fairly modern and realistic combat scenario (involving naval
tactics).

(1) Bowers and Jentsch (2001)

zz.  Half-Life  Team  Fortress  Classic  (Valve) 
(1)  Bowers  and  Jentsch  (2001) 


aaa. Antietam (Firaxis Games)

(1)  Bowers  and  Jentsch  (2001) 

bbb.  Beer  Distribution  Game 

(1)  Banbury  et  al.  (2010) 

(2)  Goodwin  and  Franklin  (1994) 

(3)  Geyer-Schulz  (1996) 

The initial list of candidate items above was evaluated via the exclusion/inclusion criteria and
narrowed  down  to  a  list  of  the  seven  most  relevant  tests,  which  are  described  below. 

Round  2  findings:  Identification  and  description  of  the  most  suitable  tests 

Several  trends  were  noticed  during  Round  2  of  the  literature  review.  First,  the  majority  of  the 
initial  literature  matching  the  stated  search  terms  (see  Methods)  led  to  business  or  management 
items, many of which were opinion pieces of a philosophical, inspirational, or otherwise
non-scientific nature. Second, many of the initial hits which appeared to discuss tests (and therefore
to  be  pertinent  to  this  review)  did  not  actually  yield  full-text  articles  describing  generalizable 
tests  or  test  batteries,  but  rather,  descriptions  of  research  projects,  laboratory  facilities,  or  the 
general  problems  surrounding  the  measurement  of  team  cognition.  Third,  when  a  report 
described a potentially relevant team performance test (or tests), practical information was
sometimes insufficient to make inferences concerning the ease of test administration, time
required for testing, ease of access (e.g., is it open-access or available commercially
“off-the-shelf”?), maturity of the test (e.g., is it widely used, established, reliable, and valid?), extent of
automated  and  objective  scoring,  generalizability,  and  configurability  (to  different  tasks  or  team 
sizes).  Additionally,  for  some  of  the  older  reports,  it  was  difficult  to  determine  (by  additional 
web  searches  or  e-mail  inquiries)  whether  the  test  was  compatible  with  the  latest 
hardware/software,  whether  there  was  continued  development  and  use  of  the  test,  whether  the 
report  described  the  latest  version  of  the  test,  or  whether  the  test  is  available  for  use  or  purchase. 
The  lack  of  practical  information  on  performance  tests  has  been  a  problem  for  human 
performance  measurement  in  general  and  has  made  applied  sources  such  as  the  Human 
Performance  Measures  Handbook  (Gawron,  2000)  particularly  useful. 

Despite  these  challenges,  the  full  list  of  more  than  50  potential  measures  was  considered 
against  the  aforementioned  qualitative  inclusion/exclusion  criteria  and  the  following  tests  were 
selected  unanimously  as  the  most  interesting  for  continued  evaluation  in  military  team 
performance  research  (tests  are  listed  in  random  order): 

a.  TANDEM 
b. TPAT


c.  TIDE2 

d.  NeoCITIES 

e.  DDD 

f.  AEDGE 

g. DuoWOMBAT
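
The narrowing from the initial search to this short list can be tallied with simple arithmetic. The sketch below uses only the counts reported in the Round 1 and Round 2 findings (571, 73, 54, and 7); the stage labels and the helper name are illustrative. Note that the units change between stages (abstracts or articles, then tests), so the two percentages describe separate steps and should not be multiplied together.

```python
# Counts reported in the Round 1 and Round 2 findings.
STAGES = [
    ("abstracts or articles screened", 571),
    ("full-text articles reviewed", 73),
    ("candidate tests identified", 54),
    ("tests selected", 7),
]


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of screened items that merited full-text review.
full_text_rate = percent(73, 571)
# Share of candidate tests that reached the final short list.
selection_rate = percent(7, 54)

for label, count in STAGES:
    print(f"{count:>4}  {label}")
print(f"full-text rate: {full_text_rate}%")
print(f"selection rate: {selection_rate}%")
```

Both steps retained roughly 13 percent of what entered them: 12.8 percent of screened items went on to full-text review, and 13.0 percent of candidate tests were selected.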

To  corroborate  this  review,  preliminary  findings  were  presented  at  two  professional 
conferences (Estrada et al., 2010; Lawson, Kelley, and Athy, 2011) attended by various human
performance  researchers,  including  people  knowledgeable  about  team  performance.  The 
audiences  were  invited  to  recommend  any  additional  tests  of  importance  which  may  have  been 
overlooked  by  this  review  effort.  No  additional  tests  were  suggested  to  the  authors  at  these  two 
conferences. 

Some  trends  were  noticed  concerning  the  seven  tests  which  were  selected  as  most  suitable  for 
military research. First, they tended to focus on the general performance of the team, rather than
on  specific  and  established  aspects  of  neurological  or  cognitive  functioning.  Second,  almost  all 
of  the  tests  involved  simulations  of  command/control  tasks. 

Some  authors  or  laboratories  contributed  disproportionately  to  the  number  of  potential  tests  on 
our  initial  list  of  more  than  50  items.  In  such  cases,  only  the  author’s  most  recent,  active,  or 
mature  tests  were  selected  for  inclusion  in  the  final  list  of  seven.  For  example,  researchers  at  the 
Institute  for  Simulation  and  Training  at  the  University  of  Central  Florida  (and  their  colleagues) 
have  been  very  active  in  teamwork  research,  and  have  come  up  with  a  number  of  team 
performance  research  projects,  laboratories,  simulations,  testbeds,  team  tasks,  or  team  tests, 
including DREAM, LFARM, TARGETs, TPAB, REMAN, and TANDEM. Of these, the authors
selected  TANDEM  for  inclusion  in  the  final  list  of  seven  most  suitable  computerized  team 
performance  tests. 

The authors reviewed the tests for trends concerning the most common independent variables
and found that workload was the factor most commonly manipulated by the experimenter
during administration of the tests (e.g., by varying the number of stimuli to keep track of
and/or the time allowed to respond).

Below,  we  briefly  summarize  each  of  the  seven  most  interesting  tests,  providing  one  or  more 
suitable references for further reading, describing the main tasks performed by the subjects,
providing  an  example  of  an  operational  scenario  relevant  to  the  test,  listing  the  main  variables 
being  studied  by  the  test  (i.e.,  measured  or  manipulated),  and  describing  the  potential  strengths 
and  limitations  of  the  test.  Since  the  amount  and  type  of  information  available  on  each  test  was 
not  the  same,  the  summary  information  about  each  test  will  vary  in  the  sections  below. 
Tactical Naval Decision-Making System (TANDEM)

Developed  by  the  Naval  Training  Systems  Center  (known  since  1993  as  the  Naval  Air 
Warfare  Training  Systems  Division),  TANDEM  is  a  low-fidelity  simulation  of  a  command, 
control, and communications environment (Dwyer et al., 1992). TANDEM was created to
simulate a military combat information center. It requires subjects to identify various targets and
decide upon an appropriate response. Following target identification, the subjects are to perform
a “final engagement” which involves a “shoot” or “clear” decision (Dwyer et al., 1992). The
identification  of  the  target  is  made  by  various  indicators  that  may  clearly  define  the  target’s 
characteristics  or  may  be  ambiguous/contradictory,  requiring  the  crew  to  engage  in  further 
interpretation  and  coordination. 

When  the  simulation  begins,  several  unidentified  targets  are  presented  on  a  screen  similar  to 
that found on a radar operator’s display. The team must select (or “hook”) a target of interest,
which is then displayed with potentially important information about it. Typically,
two or three team members work together and each member is given different information about
the target. For example, one team member may get information on the target’s type (aircraft,
ship, or submarine), another on the target’s classification (military or civilian), and a third on the
target’s likely intent (peaceful or hostile) (Lenox et al., 1999). From this information, either a
team vote or a team leader will determine what the final engagement decision will be (depending
on  the  scenario).  Following  engagement,  the  target  is  typically  removed  from  the  screen. 
Performance  measures  include  such  variables  as  the  “hook  time”  (how  long  a  target  is  selected), 
accuracy  of  labeling  of  the  target,  and  the  number  of  targets  engaged,  but  variations  of  the  task 
may  include  other  measures  such  as  whether  a  target  reached  a  dangerous  proximity  region 
(represented  by  a  small  inner  circle  within  the  display). 
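
The performance measures described above could be tallied from a per-target log along the following lines. This is a minimal sketch, not TANDEM's own scoring code: the record layout and every field name (hooked_at, released_at, and so on) are hypothetical, since the available references do not describe the format of TANDEM's output.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class TargetRecord:
    """One target's outcome in a trial (hypothetical layout, not TANDEM's)."""
    hooked_at: float            # seconds into the trial when the target was hooked
    released_at: float          # seconds into the trial when the hook ended
    assigned_label: str         # label the team gave the target
    true_label: str             # label defined by the scenario
    engaged: bool               # True if a shoot/clear decision was made
    entered_inner_circle: bool  # True if the target reached the proximity region


def summarize_trial(records: Sequence[TargetRecord]) -> dict:
    """Compute hook time, labeling accuracy, and engagement counts for a trial."""
    if not records:
        raise ValueError("at least one target record is required")
    return {
        "mean_hook_time": mean(r.released_at - r.hooked_at for r in records),
        "label_accuracy": sum(r.assigned_label == r.true_label for r in records)
        / len(records),
        "targets_engaged": sum(r.engaged for r in records),
        "inner_circle_entries": sum(r.entered_inner_circle for r in records),
    }
```

Keeping one record per target makes it straightforward to add task-specific measures, such as the proximity-region count, without changing the rest of the summary.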

System  requirements  for  TANDEM  are  very  basic.  It  is  personal  computer-based  and 
programmed  in  C++.  TANDEM  requires  at  least  a  640  by  480  video  graphics  array  (VGA)  based 
graphics  card,  a  20  megabyte  hard  drive,  640  kilobytes  of  random-access  memory  (RAM),  a 
Logitech  three-button  trackball  (also  works  with  a  mouse)  with  4.01  driver  (or  equivalent),  and 
Microsoft  DOS  version  3.3  or  later.  The  TANDEM  task  measures  the  participant’s  memory, 
decision  making  ability,  and  the  interdependence  of  team  members,  and  can  vary  the  task  by 
influencing  variables  such  as  the  overall  workload  and  the  ambiguity  of  the  information 
presented. TANDEM is fully automated, configurable, and militarily relevant. TANDEM
lacks  a  team-of-teams  capability,  and  according  to  Weaver  et  al.  (1995),  its  largest  potential 
shortcoming  is  a  failure  to  require  the  integration  of  dynamic  information  over  time. 
Nevertheless,  this  limitation  could  make  the  test  attractive  for  those  researchers  wishing  to  do  a 
simple  study  wherein  stimuli  and  communications  are  more  limited  and  controlled.  Several 
TANDEM  study  authors  were  contacted,  but  no  information  could  be  obtained  concerning 
purchasing  or  otherwise  obtaining  the  TANDEM  software.  As  a  government-sponsored  product 
developed  for  a  Department  of  Defense  (DoD)  laboratory,  it  may  be  available  without  cost  for 
legitimate  research  uses  by  government  personnel. 

Summary of TANDEM

a. Recommended reading: Dwyer et al. (1992); Canty and Schwab (2001); and Cannon-Bowers
and Salas (1998).

b.  Team  task:  Command/control;  two  to  three  participants;  members  determine  type,  threat 
level,  and  intent  of  each  contact. 

c.  Scenario:  Military  combat  information  center. 

d.  Main  factors  studied:  Decision-making,  including  interdependence,  time  pressure, 
workload,  and  ambiguity. 

e.  Strengths:  Automated;  configurable;  generalizable;  militarily-relevant. 

f.  Potential  limitations:  Relatively  low  fidelity  (does  not  require  integration  of  changing 
information over time); limited information on specific aspects of cognitive ability or shared knowledge; no
“team-of-teams”  capability;  availability  uncertain. 

Team Performance Assessment Technology (TPAT)

Developed by InterScience America, TPAT assesses individual and team performance in a
group task. TPAT measures some higher-order activities, unlike many other team tasks that are
based  more  on  the  learning  of  patterns  or  rules.  Training  is  provided  by  tutorials  that  require  the 
user  to  reach  a  criterion  of  understanding.  Once  all  team  members  reach  their  baseline,  the  main 
task  may  begin. 

The scenario-based task involves a team of individuals who must deal with information that
changes constantly over time, for example, cooperating to extract a hostage from a fictitious hostile
nation or working together as nuclear power plant technicians. Each person has control over a certain
aspect of the task, and as events unfold, the team is required to make decisions on what steps
should be taken next. Besides the events that always occur, TPAT also uses the decisions made
by the teams to create new situations, thus requiring new decisions to be made. This allows for a
quasi-experimental design. Overall, more than 1000 decision alternatives exist (Swezey, Hutcheson,
and  Swezey,  2000). 

Teams  consist  of  up  to  nine  team  members,  with  the  nine  members  generally  split  into  three 
separate  units  of  three  individuals.  A  command  structure  exists  within  the  units  and  for  the  entire 
group of units. If nine team members are not available, the TPAT program is able to simulate
any  missing  team  members,  allowing  testing  of  one  to  nine  individuals  in  this  task.  Information 
is  provided  to  team  members  via  computer  or  team  member  messages  concerning  events 
occurring  within  the  individual’s  domain  of  labor.  Based  on  these  messages,  team  members  are 
required  to  make  preliminary  decisions  on  what  should  be  done.  While  making  these  decisions, 
team  members  are  to  indicate  what  earlier  events  influenced  the  decisions  they  are  making  and 
what  possible  future  plans  they  may  have.  TPAT  records  the  decisions  acted  upon  and  how  they 
connect  to  previous  events,  letting  researchers  witness  what  led  to  a  team’s  decision,  and  what 
future plans were not acted upon.

Scores are given for the team’s performance and for the performance of each team member.
No knowledge concerning computer programming is required to score TPAT, as the output
software  of  the  program  provides  a  detailed  evaluation  of  the  participants.  A  total  of  53  scores 
are  provided,  which  include  several  performance  factors  (e.g.,  decision-making,  planning, 
strategy  development)  and  social  psychological  traits  (e.g.,  communication,  cohesion).  System 
requirements  for  TPAT  are  very  basic,  and  require  networked  Microsoft©  Windows  95  or  98 
compatible  computers.  Unfortunately,  the  first  author  of  TPAT  (Robert  Swezey)  died  in  2002 
(Van  Cott,  2004)  and  little  information  is  available  about  the  company,  InterScience  America.  At 
this time, TPAT may not be available or used extensively in research.

Summary of TPAT

a. Recommended reading: Swezey, Hutcheson, and Swezey (2000).

b. Team task: Three teams of three each, controlling command, air resources, and ground
resources.

c.  Scenario:  One  example  is  search/rescue  and  hostage  extraction. 

d.  Factors  studied:  Decision-making;  planning;  strategy;  situation  awareness;  initiative; 
communication;  cohesion;  leadership;  task  difficulty;  task  performance. 

e. Strengths: Assesses group and individual performance; includes > 50 measures;
accommodates up to nine users; has team-of-teams capability.

f. Potential limitations: First author deceased and little information available about this
company. Uncertain if this test is still being actively developed/supported or sold commercially.
The  last  available  contact  information  for  InterScience  America  (c.  1998)  was  703-779-8090, 
Sterling,  VA  (also  listed  as  being  located  in  Leesburg,  VA).  When  the  first  author  called  this 
telephone  number,  he  reached  what  sounded  like  a  home  phone,  as  inferred  from  the  answering 
machine  message  left  by  a  small  child. 

Team Interactive Decision Exercise for Teams Incorporating Distributed Expertise (TIDE2)

Developed by Hollenbeck et al. (1991), the TIDE2 is a low-fidelity simulation requiring
classification of targets in ambiguous situations. The task relies heavily on Brunswik’s lens
model (Brunswik, 1940, 1943, 1955, and 1956) for individual decision making, but instead of
providing all information to one individual, TIDE2 requires a team of four individuals to make
the  classification.  In  order  to  accomplish  this,  three  of  the  team  members  are  “experts”  within  a 
given  identification,  while  the  fourth  member  is  the  team  leader  and  has  the  final  decision  on 
target  classification.  This  team  organization  of  experts  and  a  decision  maker  is  found  in  many 
fields,  and  is  the  reason  why  TIDE2  has  been  used  in  scenarios  that  resemble  Naval 
command/control  team  tasks,  personnel  selection  team  tasks,  and  medical  decision  making  team 
tasks  (to  name  a  few  applications).  The  key  to  the  TIDE2  team  task  is  that  no  individual  has  an 
understanding  of  all  of  the  characteristics  required  to  make  all  classifications  of  the  targets,  but 
among  all  of  the  team  members,  an  accurate  decision  can  be  reached. 

Although  many  different  scenarios  can  be  used  for  TIDE2,  we  will  discuss  the  Naval 
command/control  type  task,  since  that  is  most  relevant  to  the  military  community.  Each  target 
typically  has  six  characteristics  that  determine  the  threat  level  of  the  target.  Despite  the  fact  that 
all  of  the  characteristics  of  the  target  are  presented  to  all  four  team  members,  because  of 
differences  in  pre-task  training  only  “experts”  can  process  certain  characteristics  of  the  target 
(usually  one  third  of  the  information,  thus  two  characteristics)  and  therefore  all  experts’  opinions 
are  needed  to  make  an  accurate  decision.  All  team  members  are  stationed  at  physically  isolated 
computers connected by a network. This prevents verbal communication and allows only short
messages to be sent from one team member to another, which makes it easy for the
researchers to record communications. The task begins with a “target” (such as an unknown
aircraft) entering the team’s airspace. One target is presented at a time in this task. A timer
indicates  how  long  the  team  has  to  designate  the  target’s  threat  level.  Each  team  member 
determines  the  target’s  threat  value  based  on  information  within  his/her  domain  of  expertise,  and 
then  relays  this  to  other  team  members  (typically  the  leader)  as  he/she  deems  fit.  Team  members 
may also request information on the target. Targets can contain information that clearly
agrees across all characteristics (all point to the target being threatening), can be ambiguous (data
fall in a grey zone for the experts), or can have conflicting data (expert one determines
the target is a threat while experts two and three process the target as being non-threatening).
Before the timer reaches zero, the “commander” must rate the target’s threat level on a
seven-point scale (ranging from the decision that the target can be safely ignored to the decision that a
defensive action is necessary). Scoring of team performance entails giving two points for
accurately  diagnosing  the  target,  one  point  for  being  one  classification  away  from  the  correct 
target  classification  (selecting  a  two  on  the  threat  level  when  the  target  was  a  three),  zero  points 
for  being  two  classifications  away,  minus  one  point  for  being  three  classifications  away,  and 
minus  two  points  for  being  four  or  more  classifications  away.  Feedback  is  given  immediately 
after  the  final  decision  is  made.  If  a  decision  is  not  made  in  the  time  allotted,  the  target  is 
classified  as  missed  or  ignored. 
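
The scoring rule described above reduces to a simple function of the distance between the commander's rating and the correct rating. The sketch below is illustrative only and is not taken from the TIDE2 software: the function name is hypothetical, the scale is assumed to be numbered 1 through 7, and a missed target returns no score because the point value for a miss is not stated here.

```python
from typing import Optional


def tide2_points(decision: Optional[int], correct: int) -> Optional[int]:
    """Score one target on a seven-point threat scale (assumed to run 1 to 7).

    Returns None when no decision was made in time, since the point value
    assigned to a missed or ignored target is not specified.
    """
    if decision is None:
        return None
    for level in (decision, correct):
        if not 1 <= level <= 7:
            raise ValueError("threat levels must be between 1 and 7")
    distance = abs(decision - correct)
    # 0 away -> +2, 1 away -> +1, 2 away -> 0, 3 away -> -1, 4 or more -> -2
    return 2 - min(distance, 4)
```

For the example given in the text, rating a target as a two when the correct level was a three is one classification away, so `tide2_points(2, 3)` yields one point.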

For  TIDE2,  experimenters  can  manipulate  the  time  that  a  team  has  available  to  make  the 
classification  of  the  target,  the  time  given  in  between  the  presentations  of  targets,  and  the 
agreement  between  the  team  members  in  the  task.  This  allows  researchers  to  create  high  stress 
scenarios  (little  time  to  make  a  decision,  rapidly  having  to  make  decisions,  or  experts  sending 
conflicting information to the leader) that can lead to limited communications or undervaluing
certain team members’ input. However, this task does not present changing information, since all
information is constant with regard to a given target.

The  TIDE2  teamwork  task  requires  four  very  low-end  IBM  compatible  computers  with  at  least 
a  386  processor,  DOS  4.01  or  higher  capabilities,  and  basic  network  connections  to  each  other 
(the  main  computer  should  be  the  server).  It  should  still  be  possible  to  purchase  TIDE2  software 
by contacting the original authors (Hollenbeck et al., 1991), who are associated with Michigan
State  University.  The  software  was  not  expensive  in  1991,  selling  for  $25.00  at  that  time.  The 
authors  state  that  the  software  may  be  free  for  certain  organizations.  As  a  government-sponsored 
product  developed  originally  for  the  Office  of  Naval  Research  (Hollenbeck  et  al.),  it  may  be 
available  without  cost  for  legitimate  research  uses  by  government  personnel.  It  is  not  certain  if 
the  software  has  been  maintained  and  updated.  Inquiries  can  be  made  to  Dr.  John  Hollenbeck  at 
jrh@msu.edu. 

Summary  of  TIDE2 

a.  Recommended  reading:  Hollenbeck  et  al.  (1991);  Hollenbeck  et  al.  (1997). 

b.  Team  task:  Four  team  members  (leader,  etc.)  must  discover  intent  of  targets. 

c. Scenario: Naval command/control.

d. Factors studied include: Judgment accuracy; policy capturing (determining which aspects
of the information are driving the ultimate decision); process tracing (the information-seeking
process of participants).

e.  Strengths:  Military  relevant;  some  validation  work  has  been  done. 

f.  Potential  limitations:  Uncertain  if  software  is  available  commercially  or  has  been  updated. 

C3  (Command,  Control,  and  Communications)  Interactive  Task  for  Identifying  Emerging 
Situations  (NeoCITIES) 

The  NeoCITIES  task  was  developed  to  measure  team  performance,  communication,  and  team 
cognition  under  pressure.  The  inception  of  the  simulation/game  was  first  described  by  Wellens 
and  Ergener  (1988).  This  primary  article  described  a  simulation  task  that  allowed  the 
experimenter  to  manipulate  a  multitude  of  independent  variables  while  simultaneously  and 
automatically  recording  a  large  number  of  dependent  measures.  Over  time,  the  task  evolved  with 
improvements  in  technology  and  empirical  knowledge.  At  present,  the  task  employs  a  virtual  city 
in  a  crisis  management  scenario  requiring  response  from  emergency  services.  The  team 
members’ shared goal is to respond appropriately to emergency events and prevent city-wide
devastation  while  maintaining  civil  order.  A  total  of  two  to  three  teams  of  two  people  each  must 
communicate  and  work  cooperatively  to  achieve  these  shared  goals.  Each  pair  is  composed  of  an 
information  manager  and  a  resource  manager.  Communication  options  are  varied,  occurring  by 
means  of  interactive  touch-screens,  monitors,  microphones,  and  audio  conferencing. 
Measurement  of  communication  is  possible  between  teams  and  also  within  teams.  Each  team 
must  monitor  changing  events  as  well  as  the  allocation  of  resources.  The  quantitative  outcome 
variables  in  the  task  include  communication  frequency  and  type.  Data  are  recorded  electronically 
and  can  be  supplemented  and  enriched  by  means  of  a  full  system  structure  including  heart 
monitors  on  the  participants. 

The  same  researchers  at  Pennsylvania  State  University  who  developed  NeoCITIES  in  its 
present  form  are  moving  this  task  further  into  the  realm  of  modeling  of  dynamic  decision  making 
by  advancing  the  structural  components  of  the  scenario  employed.  For  example,  these 
researchers  are  developing  an  interactive  scenario  for  training  and  experimental  assessment  in 
NeoCITIES  (e.g.,  El-Nasr,  Jones,  and  McNeese,  2004). 

NeoCITIES  is  a  work  in  progress.  The  authors  of  this  review  were  unable  to  locate  specific 
information  concerning  test  properties  such  as  reliability  and  validity.  However,  it  should  be 
noted  that  the  face  validity  of  this  measure  appears  adequate.  The  test  is  not  specific  to  military 
applications,  but  it  has  relevance  to  some  military  operations.  While  some  of  the  measures  in  this 
task  are  automated,  the  non-automated  nature  of  the  audio/video  scoring  method  requires 
additional  equipment  and  makes  rapid,  accurate  analysis  of  communication  data  more 
challenging. 

Currently,  NeoCITIES  is  not  available  commercially,  because  it  is  a  tool  used  in  the  academic 
setting  and  the  developers  cannot  ensure  maintenance  and  updates  to  multiple  outside  users. 
However, the test has been used in a number of settings and interested readers should contact Dr.
Michael McNeese (MMcNeese@ist.psu.edu) of Pennsylvania State University to determine
whether research collaborations are possible. Presently, the program is being used by the
Multidisciplinary Initiatives in Naturalistic Decision Systems (MINDS) Group at Pennsylvania
State University (http://minds.ist.psu.edu), and by the North-East Visualization and Analytics
Center, which is involved in an updated version called NeoCITIES Geo-Tools
(www.geovista.psu.edu/resources/flyers/NEVAC_Thrust-4a_NeoCITIES_final.pdf).

Summary  of  NeoCITIES 

a.  Recommended  reading:  Wellens  and  Ergener  (1988);  El-Nasr,  Jones,  and  McNeese 
(2004); project was last headed by McNeese (McNeese et al., 2005).

b. Team task: Monitoring of changing events; allocation of resources; two to three teams
(fire, police, etc.) of two members each (information manager, resource manager).

c. Scenario: Medium fidelity metropolitan crisis control center; goal is to respond to
emergency events, maintain order, and prevent city-wide catastrophe (e.g., due to terrorist
attack). 

d.  Factors  studied:  Distributed  decision-making;  inference  accuracy  as  a  function  of  crisis 
tempo,  data  rate,  and  decision  complexity. 

e.  Strengths:  Allows  study  of  teams-of-teams;  has  good  measurement  capabilities;  was 
actively  used  and  updated  as  of  2008;  was  one  of  the  top-rated  platforms  (among  44)  reviewed 
by  Go,  Bos,  and  Lamoureux  (2006). 

f.  Potential  limitations:  Not  a  military  scenario  (but  fairly  applicable);  limited  information  on 
shared knowledge; not all measures appear to be automated (e.g., audio/video); not
available  commercially. 

Distributed  Dynamic  Decision  Making  (DDD) 

The  DDD  task  was  developed  by  Aptima,  Inc.  to  study  aspects  of  team  performance  including 
communication in a complex and dynamic environment through a simulation platform.
Validation efforts have been underway for 20 years and the task has been employed by military
researchers for over 10 years (Aptima, 2010). DDD is marketed as a research tool that can also
be used for training. While originally designed as a simulation of a military command/control
environment, the task can be tailored to other contexts. Workload, information availability, and
team structure can be manipulated. The dependent variables measured by the DDD represent
individual and team performance and include, but are not limited to, “latency to process a task,
accuracy in processing a task, percentage of tasks processed, and percentage of tasks processed
at 100% accuracy” (Baker et al., 2004).
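
As a worked illustration of the four dependent variables quoted above, the sketch below computes them from a list of task outcomes. It is not based on the DDD software or its log format: the tuple layout and function name are hypothetical, and both percentages are assumed to use the total number of tasks presented as the denominator.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# One entry per task presented: (latency in seconds, accuracy from 0.0 to 1.0),
# or None when the task was never processed. This layout is hypothetical.
TaskResult = Optional[Tuple[float, float]]


def summarize_tasks(results: Sequence[TaskResult]) -> dict:
    """Compute latency, accuracy, and the two percentage measures."""
    if not results:
        raise ValueError("at least one task is required")
    processed = [r for r in results if r is not None]
    perfect = [r for r in processed if r[1] == 1.0]
    return {
        "mean_latency": mean(r[0] for r in processed) if processed else None,
        "mean_accuracy": mean(r[1] for r in processed) if processed else None,
        # Assumption: both percentages are taken over all tasks presented.
        "pct_processed": 100.0 * len(processed) / len(results),
        "pct_processed_at_full_accuracy": 100.0 * len(perfect) / len(results),
    }
```

With four tasks, of which three are processed and two of those are fully accurate, the two percentage measures come out to 75 and 50.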

More  than  a  dozen  publications  are  listed  on  the  Aptima  website,  which  references  and 
supports the DDD task. A trial version of the program is available for free download, as are
Adobe© Portable Document Format (PDF) versions of tutorials and details of the specifications
and requirements to use the DDD task. The software program operates using Microsoft©
Windows XP, and data are logged in Extensible Markup Language (XML) format, which
simplifies  the  process  of  exporting  to  third  party  programs  (e.g.,  for  statistical  analysis).  Users 
have  the  freedom  to  customize  their  own  scenarios  and  training  environments.  The  program 
appears  user-friendly  and  provides  multiple  resources  to  demonstrate  and  guide  installation  and 
configuration.  However,  the  testing  environment  is  minimally  realistic,  which  may  be  an 
important  consideration  for  certain  types  of  research. 

DDD  has  been  adapted  to  an  Internet-based  version  which  allows  participants  in  different 
locations  to  participate  in  the  same  mission  in  real-time.  Using  this  feature,  a  maximum  of  50 
participants can engage in the same mission and “chat” using private or broadcast chat groups,
e-mail communication, and voice communication. It should be noted that for purposes of analysis,
voice communications are then scored by the researchers rather than automatically. The DDD
task is designed to be compatible with other Aptima behavior modeling software for researchers
who aim to further enrich the performance measurement capabilities of the program. Detailed
review, analysis, and discussion of the DDD task are provided in MacMillan et al. (2004).
USAARL recently purchased a configuration of DDD which cost under $20,000 (government
price), but understands that further discounts may apply for academic agencies. For a DDD
brochure, trial version, or price quote information, see http://www.aptima.com/products/ddd.

Summary  of  DDD 

a.  Recommended  reading:  Galster,  Nelson,  and  Bolia  (2005). 

b.  Strengths:  Measures  automated;  can  easily  manipulate  task  load/demand  upon  users  (e.g., 
number,  type,  timing,  uncertainty  of  tasks);  can  manipulate  authority  levels,  communication,  and 
information  availability;  much  research  activity  over  more  than  a  decade. 

c.  Potential  limitations:  Small  team  size;  tests  fewer  aspects  of  cognitive  ability  than  some 
other  tests  (e.g.,  AEDGE);  no  “team-of-teams”  capability. 

Agent  Enabled  Decision  Group  Environment  (AEDGE) 

The  AEDGE  measure  of  team  decision  making  employs  a  command/control  scenario 
involving  the  weapons  director  team  of  an  AWACS.  This  scenario  embodies  the  core 
characteristics  of  a  command/control  environment  including  surveillance  and  communication.  In 
this task, participants must exchange, interpret, and weigh information as well as coordinate
tactical  action  to  successfully  accomplish  overall  goals  of  the  task  (Go,  Bos,  and  Lamoureux, 
2006). 

AEDGE  is  a  Java-based  technology  which  was  developed  by  software  engineers  and 
researchers at 21st Century Systems, Inc. for training and performance research (Barnes et al.,
2002).  The  development  of  the  task  platform  involved  the  input  of  subject  matter  experts 
(SMEs),  focus  group  interviews,  and  cognitive  task  analyses  to  ensure  that  the  product  was 
representative  of  operational  settings.  The  task  involves  human  users  and  computer-generated 
agents that may adopt any role in a scenario. Any entity (either friendly or hostile) not controlled
by a human is controlled by the computer. A computer agent will make recommendations for a
course  of  action  which  the  human  may  or  may  not  choose  to  view.  Regardless  of  the  human’s 
performance  and  behavior,  the  system  logs  and  captures  the  agent’s  recommendations  thus 
allowing  direct  comparison  between  human  and  agent  with  respect  to  decision  making  and 
judgments. This enhances the system’s ability to model individual and team performance. An
additional benefit of the agent-generated action recommendations is that the
system may manipulate their quantity and quality (complexity) so as to vary the degree of
cognitive workload. The experimenter has control over the configuration of a decision aid agent
with  respect  to  decision  making  style  including,  but  not  limited  to,  the  degree  of  risk  acceptable, 
aggressiveness,  or  certainty.  Likewise,  the  experimenter  may  control  the  probability  of  a 
successful  decision  to  be  made  given  the  environment.  Dependent  measures  include 
communications  expressed  by  speech  as  it  is  captured  using  voice  generation  technology  and 
recordings.  Other  dependent  measures  are  not  as  clearly  stated  in  the  references  and  resources 
available.  However,  the  authors  infer  that  the  latency  and  accuracy  of  the  decisions  made  are 
recorded. 

This  system  seems  to  be  well-suited  for  the  researcher  interested  in  how  one  person’s 
decisions  are  influenced  by  the  decision  making  style  of  a  “partner”  in  a  team  scenario.  Many  of 
the  options  that  may  be  manipulated  by  the  experimenter  are  specific  to  the  type  of 
recommendations  made  by  the  agent-controlled  entity  (e.g.,  degree  of  directional  bias,  degree  of 
riskiness,  and  degree  of  certainty).  The  product  is  available  as  commercial  off-the-shelf  in  two 
variations  and  21st  Century  Systems,  Inc.,  offers  maintenance  and  support  services.  Inquiries  can 
be  made  to  awilson@21csi.com. 

Summary  of  AEDGE 

a. Recommended reading: Barnes et al. (2002); AEDGE website,
http://www.21csi.com/aedge. 

b.  Team  task:  High  fidelity  strategic/operational  task  -  command/control. 

c.  Scenario:  Weapons  director  roles  in  an  AWACS  system. 

d.  Factors  studied:  Individual  and  team  workload;  communication;  decision-making. 

e.  Strengths:  Has  voice  recognition  and  response;  can  monitor  and  vary  communication 
frequency  and  media;  can  examine  individual  performance  in  the  team  setting;  accommodates 
medium-size  teams;  AWACS  simulation  task  based  on  SMEs  and  task  analyses;  appears  readily 
available  and  actively  used. 

f.  Potential  limitations:  Limited  information  on  shared  knowledge;  possible  mix  of  computer 
and  observer  measures;  possible  complexity. 

Duo Wondrous Original Method Basic Awareness/Airmanship Test (DuoWOMBAT)

The memorably-named DuoWOMBAT is a modified version of the single-user WOMBAT
and  is  designed  to  extend  the  WOMBAT  to  measure  crew  coordination  and  shared  situation 
awareness.  Participants  must  work  cooperatively  to  accomplish  relatively  simple  shared  tasks. 
The test measures performance with respect to divided attention among multiple information
sources, judgment of priorities, ability to estimate probable outcomes, judgment of alternative
courses of action, and division of attention among tasks of varying levels of urgency. The
DuoWOMBAT provides a good assessment of crew resource management and team
coordination for establishing good situational awareness; however, the task is limited to two
participants, and quantifiable measurement of communication (an important aspect of team
performance) is excluded from the data output.

The  test  was  designed  to  simulate  practical  military  challenges,  such  as  the  need  for  team 
effectiveness  under  conditions  of  operational  stress  (Breton,  Tremblay,  and  Banbury,  2007).  The 
measure  shows  good  test-retest  reliability  and  predictive  validity  (Roscoe,  Corl,  and  LaRoche, 
2001).  Participants  are  seated  side  by  side  at  two  WOMBAT  consoles  separated  by  a  partition 
which  increases  the  communication  effort  made  by  the  team  members.  The  DuoWOMBAT  is 
composed  of  a  primary  target  tracking  task  and  three  secondary  tasks:  figure-rotation,  quadrant- 
location,  and  digit-cancelling.  The  constructs  measured  include  target  tracking,  pattern 
recognition, spatial orientation, and short-term memory. Participants are presented with tasks
individually  and  in  dual  testing  phases  (Breton,  Tremblay,  and  Banbury).  In  the  target  tracking 
task,  participants  must  maintain  two  vertical  lines  on  either  side  of  a  moving  hexagon  using  their 
left  hand  while  maintaining  a  cross  centered  inside  a  moving  circle  with  their  right  hand.  The 
figure-rotation  task  requires  the  team  members  to  work  together  to  decide  if  two  figures  are 
identical,  mirror  images,  or  different.  Each  team  member  controls  the  rotation  of  one  of  the 
figures.  In  the  quadrant-location  task,  numbers  appear  in  groups  of  eight  in  the  four  quadrants  of 
the  display  and  participants  must  cancel  the  numbers  in  sequence  by  pressing  the  key  that 
corresponds  to  the  quadrant  in  which  the  number  lies.  Finally,  in  the  digit-cancelling  task,  single 
digits  are  displayed  sequentially  inside  a  square.  Once  the  third  digit  is  displayed,  participants 
must  begin  “cancelling”  the  digits,  starting  with  the  first  digit  displayed,  by  pressing  the  digits  on 
the  keyboard  in  the  order  that  they  were  shown  (Breton,  Tremblay,  and  Banbury). 
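
The digit-cancelling task described above behaves like a delayed-recall task with a lag of two items. The sketch below illustrates that logic and is not derived from the WOMBAT software: the function names are hypothetical, and it assumes that one digit is cancelled for each newly displayed digit, which the description implies but does not state outright.

```python
from typing import Optional, Sequence

LAG = 2  # cancelling starts once the third digit is on screen


def digit_to_cancel(shown: Sequence[int], position: int) -> Optional[int]:
    """Return the digit expected while shown[position] is displayed (0-based).

    Returns None for the first two digits, before cancelling has started.
    Assumes one digit is cancelled per newly displayed digit.
    """
    if not 0 <= position < len(shown):
        raise IndexError("position is outside the displayed sequence")
    return shown[position - LAG] if position >= LAG else None


def count_correct(shown: Sequence[int], pressed: Sequence[int]) -> int:
    """Count key presses that match the displayed digits in their original order."""
    return sum(1 for expected, key in zip(shown, pressed) if expected == key)
```

For a displayed sequence beginning 4, 9, 1, the participant is expected to press 4 once the 1 appears, so the task loads short-term memory with two pending digits at all times.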

Both  tests,  the  WOMBAT  and  DuoWOMBAT,  were  developed  by  Drs.  Jean  LaRoche  and 
Stanley Roscoe at Aero Innovation, Inc. Price quotes concerning the equipment and software to
utilize this system are available on the Aero Innovation, Inc. website (www.aero.ca). A 2010
price list is shown at www.aero.ca/e_W_prices_CS.html. To run the DuoWOMBAT, two
WOMBAT stations are required; however, the second copy of the software is free of charge.
USAARL received a price quote of approximately $60,000 to upgrade its complete (c. 2000)
system to 2011 standards. Prices are subject to change and an official quotation must be
requested. 

The  DuoWOMBAT  provides  a  good  assessment  of  crew  resource  management  and  team 
coordination  for  establishing  good  situational  awareness.  However,  the  task  is  limited  to  two 
participants,  the  interface  is  out-of-date  in  terms  of  technological  display,  and  quantifiable 
measurement  of  communication,  an  important  aspect  of  team  performance,  is  excluded  from  the 
data  output.  It  is  important  for  the  experimenter  to  carefully  weigh  these  factors  relative  to 
his/her  specific  research  goals. 

Note: The WOMBAT and DuoWOMBAT were purchased c. 2000 by USAARL for
discretionary  use  in  its  research  programs.  The  decision  to  purchase  these  tests  was  not  made  by 
the  authors  of  this  review,  who  were  not  working  at  USAARL  at  the  time  the  tests  were 
purchased.  Moreover,  USAARL  researchers  are  not  required  to  use  the  DuoWOMBAT. 
Therefore,  the  authors  perceive  no  conflict  of  interest  or  bias  deriving  from  the  fact  that 
USAARL  owned  this  test  prior  to  the  execution  of  this  review  by  the  authors. 

Summary  of  DuoWOMBAT 

a.  Recommended  readings:  Roscoe  (1993);  Roscoe  (1997);  Roscoe,  Corl,  and  LaRoche 
(2001);  Odle-Dusseau,  Bradley,  and  Pilcher  (2010). 

b.  Team  task:  Low  fidelity  multitasking,  some  information  processing  and  some  minor 
psychomotor  aspects. 

c.  Scenario:  Shared  situation  awareness  and  crew  coordination  (e.g.,  aircrew). 

d. Factors studied: Attention to multiple information sources; evaluation of alternatives;
establishment  of  priorities;  estimation  of  probable  outcomes  of  actions. 

e.  Specific  measures:  Include  shared  performance  on  tasks  involving  target  tracking 
(attention  and  psychomotor  ability);  figure  rotation  (spatial  ability);  quadrant  location  (pattern 
recognition);  and/or  digit  cancellation  (working  memory). 

f.  Strengths:  Open  source;  only  test  directly  relevant  to  shared  display/control  (rather  than 
strategic  command/control);  has  been  widely  used  at  various  sites  and  seems  stable  and  reliable; 
tasks  well  specified  in  regards  to  accepted  cognitive  abilities  recognized  in  neuropsychology. 

g.  Potential  limitations:  Simplistic  interface  -  tests  basic  cognitive  abilities;  not  clear  if  it  can 
capture  communication  (e.g.,  content,  frequency);  only  appears  configured  for  two  users;  does 
not  have  “team  of  teams”  capability;  not  applicable  to  strategic  command/control;  relatively  high 
cost for certain configurations (approximately $60,000).

New  tests  in  development 

After  the  collection  of  the  literature  for  this  review  but  prior  to  the  submission  of  this  report, 
the  first  author  identified  a  few  new  preliminary  team  performance  reports  (Lum,  Sims,  and 
Salas,  2011;  Wiese,  Pavlas,  and  Fiore,  2011).  These  team  performance  experiments  did  not 
directly  exploit  the  existing  tests  in  this  review,  nor  were  the  experiments  primarily  intended  to 
develop new computerized team performance test batteries. However, one new computerized test
of team performance was identified that is based on the C3Fire test (www.c3fire.org) listed in
this  review;  it  is  called  the  C3Conflict  (Smith,  2011;  Granlund,  Smith,  and  Granlund,  2011).  Full 
evaluation  of  these  new  reports  was  not  completed  prior  to  the  submission  of  this  report,  but  a 
preliminary  look  at  C3Conflict  suggests  that  it  has  many  desirable  features  which  would  fit  the 
inclusion  criteria  of  this  review  and  that  further  attention  is  warranted  as  this  test  matures.  While 
this test is not as developed as the “final seven” recommended in this review, it is mentioned
here  because  it  is  designed  for  military  applications  and  its  very  recent  vintage  ensures  that  it  is 
being  actively  used  and  will  work  with  the  latest  computer  systems. 


Conclusions 


Of  54  potential  team  performance  tests  identified  in  this  review,  the  seven  tests  deemed  of 
greatest interest for military research were (in no order of preference): TANDEM, TPAT, TIDE2,
NeoCITIES, DDD, AEDGE, and DuoWOMBAT. NeoCITIES is the only test not designed
specifically  for  military  applications,  but  it  was  included  because  it  has  many  desirable  features 
and  it  is  suitable  for  paramilitary  (e.g.,  police)  situations  and  for  scenarios  relevant  to  national 
defense,  such  as  simulating  a  coordinated  emergency  response  to  terrorist  attacks  on  civilian 
centers.  The  test  most  similar  to  the  rudimentary  aspects  of  flight  control  tasks  engaged  in  by 
military aviation crewmembers is the DuoWOMBAT. The other six tests focused on various
aspects  of  team  performance  most  relevant  to  command/control  situations,  such  as  handling 
threats  and  allocating  resources.  Among  these  seven  tests,  the  ones  which  were  judged  as  most 
likely  to  be  relevant,  readily  available,  widely/recently  used,  and  relatively  mature  in  terms  of 
validation  include  NeoCITIES,  DDD,  and  DuoWOMBAT.  Of  these,  the  DDD  (purchased  after 
review)  and  DuoWOMBAT  (purchased  before  review)  are  available  for  purchase  and  USAARL 
has  chosen  to  obtain  a  copy  of  each.  In  addition,  a  new  test  was  identified  (called  C3Conflict) 
after  the  completion  of  literature  gathering  for  this  review.  It  is  mentioned  here  because  it 
appears to have many desirable features and should be considered further as additional
development and validation are completed.


Recommendations 


Researchers  studying  military  or  paramilitary  team  performance  should  consider  the 
information  in  this  report  when  seeking  to  identify  the  tests  most  appropriate  to  the  specific 
needs  of  the  scientific  effort  being  planned.  Further  use,  refinement,  validation,  and  comparison 
of the existing automated group performance measures are encouraged.

This  report  identified  seven  existing  tests  which  appeared  most  applicable  to  military  research. 
In  addition,  the  original  list  of  more  than  50  potential  tests  of  interest  is  provided,  since  some  of 
these  other  tests  may  have  specific  features  of  importance  to  a  given  experiment.  For  example, 
the  DuoWOMBAT  is  suitable  for  an  experimental  scenario  emphasizing  shared  tactical 
display/control  or  shared  situation  awareness  (e.g.,  crew  coordination  in  the  cockpit),  while  the 
other  six  tests  described  are  more  appropriate  for  command/control  scenarios. 

While  no  single  test  will  be  applicable  to  all  situations,  there  are  obvious  drawbacks  to 
publications  exploring  similar  themes  in  team  research  separately  via  dozens  of  different  team 
performance  tests.  Single-study,  experiment-specific  research  considerations  should  be  balanced 
against  the  multi-study  benefits  of  focusing  team  research  more  narrowly  on  a  few  key  tests  used 
across several laboratories. Such benefits include improved ability to compare findings
across multiple studies. At least within the restricted realm of command/control tasks, it appears
possible to narrow the field of tests considerably. Unfortunately, it is not clear whether each of
the final six command/control tasks deemed to be of greatest interest during this review (viz.,
TANDEM, TPAT, TIDE2, NeoCITIES, DDD, and AEDGE) is available for immediate use or
purchase. It appears that relatively few of the tests of team performance make a successful
transition  as  easily-obtained  or  commercial  off-the-shelf  applications.  This  may  lead  to  a 
situation  where  new  tests  continue  to  be  developed  rather  than  existing  tests  being  improved, 
validated,  and  compared.  Some  of  the  most  important  practical  questions  about  team 
performance  measurement  will  not  be  answered  efficiently  by  the  continued  introduction  of  new 
measures of team performance. Empirical experience with, and refinement of, existing team
performance tests are needed. Until further validation and a head-to-head test comparison are done,
a  researcher’s  choice  of  which  test  to  use  will  tend  to  be  driven  less  by  the  quality  of  the  test’s 
scientific  properties  than  by  logistical  or  psychological  factors,  such  as  the  test’s  cost, 
availability,  ease  of  data  administration  and  analysis,  perceived  “realism,”  perceived  relevance  to 
the  mission  or  agency,  novelty,  or  place  of  development  (e.g.,  whether  the  test  was  “invented 
here”). 